- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Semantic_text match_all with Highlighter #128702
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Semantic_text match_all with Highlighter #128702
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start, I've added some high level comments on overall organization. We can do a deeper review when it's closer to ready.
        
          
                .../java/org/elasticsearch/xpack/inference/queries/SemanticMatchAllQueryRewriteInterceptor.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                .../java/org/elasticsearch/xpack/inference/queries/SemanticMatchAllQueryRewriteInterceptor.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...src/main/java/org/elasticsearch/xpack/inference/queries/SemanticQueryRewriteInterceptor.java
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a much simpler way to fix this by updating the semantic highlighter. Take a look at how you can change QueryVisitor in extractDenseVectorQueries and extractSparseVectorQueries:  https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/highlight/SemanticTextHighlighter.java#L250
| @elasticmachine update branch | 
| Pinging @elastic/search-eng (Team:SearchOrg) | 
| Pinging @elastic/search-relevance (Team:Search - Relevance) | 
| Hi @Samiul-TheSoccerFan, I've created a changelog YAML for you. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for these changes, looks great @Samiul-TheSoccerFan !
        
          
                docs/changelog/128702.yaml
              
                Outdated
          
        
      | @@ -0,0 +1,5 @@ | |||
| pr: 128702 | |||
| summary: Highlighting support added to semantic_text with `match_all` | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Since this is a bug, perhaps we call the changelog summary something like `Fix issue where highlighting was not returned with match_all queries"
| queries.add(fieldType.createExactKnnQuery(VectorData.fromFloats(knnQuery.getTargetCopy()), null)); | ||
| } else if (query instanceof KnnByteVectorQuery knnQuery) { | ||
| queries.add(fieldType.createExactKnnQuery(VectorData.fromBytes(knnQuery.getTargetCopy()), null)); | ||
| } else if (query instanceof MatchAllDocsQuery) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😍 Nice optimization from the first solution!
| @elasticmachine update branch | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach looks good! Caught a copy/paste error in the tests that we should fix though.
        
          
                ...nce/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...nce/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml
              
                Outdated
          
            Show resolved
            Hide resolved
        
              
          
                ...src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
              
                Outdated
          
            Show resolved
            Hide resolved
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Good clean solution and tests :)
| @elasticmachine update branch | 
| @elasticmachine update branch | 
| 💔 Backport failed
 You can use sqren/backport to manually backport by running  | 
| 💚 All backports created successfully
 Questions ?Please refer to the Backport tool documentation | 
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
| 💚 All backports created successfully
 Questions ?Please refer to the Backport tool documentation | 
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
* initial implementation for match_All * reformat * [CI] Auto commit changes from spotless * Excluding matchAllintercepter * Adding matchAllDocs support for vector fields * [CI] Auto commit changes from spotless * Remove previous implementation * Adding yaml tests for match_all * fixed yaml tests * Update docs/changelog/128702.yaml * Update changelog * changelog - update summary * Fix wrong inference names for the yaml tests --------- Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit d1b5532) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferenceFeatures.java # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter.yml # x-pack/plugin/inference/src/yamlRestTest/resources/rest-api-spec/test/inference/90_semantic_text_highlighter_bwc.yml
This PR addresses the missing
highlightingfor semantic_text fields when amatch_allquery is used. We guide users to use semantic highlighting to view their chunks, but highlighting does not currently return any results withmatch_all. Highlighting is usually unnecessary for non-inference fields withmatch_all, but it's essential forsemantic_textfields to return the generated chunks. This change ensures those highlighting results are returned as expected.Test cases:
Only text fields
only inference fields
Both infer and non-inference fields
Both dense vector fields
Some additional cases